class: center, middle, inverse, title-slide .title[ # Databases with R ] .subtitle[ ##
https://pos.it/databases_24
] .author[ ### Kirill Müller ] .author[ ### cynkra GmbH ] .date[ ### August 12, 2024 ] --- <style type="text/css"> .pull-left { margin-top: -25px; } .pull-right { margin-top: -25px; } .r.hljs.remark-code { font-size: 20px; } .remark-code { font-size: 20px; } pre { white-space: pre-wrap; /* css-3 */ white-space: -moz-pre-wrap; /* Mozilla, since 1999 */ white-space: -pre-wrap; /* Opera 4-6 */ white-space: -o-pre-wrap; /* Opera 7 */ word-wrap: break-word; /* Internet Explorer 5.5+ */ margin-bottom: 0px; } .remark-slide-content { padding-top: 0px; padding-bottom: 0px; } .remark-slide-scaler { overflow-y: auto; } .font17 { font-size: 17px; } .font14 { font-size: 14px; } .script-number-font { color: inherit !important; opacity: 0.9 !important; transform: scale(1.8); transform-origin: top left; } .script-number { color: inherit !important; opacity: 0.9 !important; position: absolute; top: 12px; right: 20px; transform: scale(1.8); transform-origin: top right; } .exercise-timer { position: fixed; bottom: 12px; left: 20px; transform: scale(1.8); transform-origin: bottom left; } .exercise-timer img { max-width: none; /* Prevent the image from being constrained by the container */ } </style> # https://pos.it/databases_24 .right[Welcome to posit::conf(2024)] - WiFi: `Posit Conf 2024` Password: `conf2024` - There are gender-neutral bathrooms located on levels 3, 4, 5, 6 & 7 - There is a meditation/prayer room is located in 503. Available Mon & Tues 7am - 7pm, and Wed 7am - 5pm. - The lactation room is located in 509, same timings as above. - Participants who do not wish to be photographed have red lanyards; please note everyone’s lanyard colors before taking a photo and respect their choices. - The Code of Conduct can be found at https://posit.co/code-of-conduct. Please review them carefully. You can report Code of Conduct violations in person, by email, or by phone. Please see the policy linked above for contact information. --- # https://pos.it/databases_24 .right[Schedule] | Time | Activity | | :------------ | :--------------- | | 09:00 - 10:30 | Talking to the database | | 10:30 - 11:00 | *Coffee break* | | 11:00 - 12:30 | Working with files | | 12:30 - 13:30 | *Lunch break* | | 13:30 - 15:00 | Digging in deeper | | 15:00 - 15:30 | *Coffee break* | | 15:30 - 17:00 | Exercises - Bring your own data | --- # https://pos.it/databases_24 .right[General remarks] - This course is interactive: go ahead and ask if anything is unclear! - More extended Q&A in the last section of the workshop - Discord channel: *#workshop-databases* - Join through <https://pos.it/conf-event-portal> --- .pull-left[ # Roles gravitating around a database - Database researcher / developer - Database operator / administrator - Data engineer - **Analytical engineer** - **Data scientist** - **Data analyst** ] .pull-right[ # Exercise Talk to your neighbors. - Which of the above-mentioned roles do you cover in your organization? - What do you expect to take home from the workshop? ] <div class="exercise-timer"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="84" height="20" role="img" aria-label="Timer: 05:00"><title>Timer: 05:00</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="84" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="41" height="20" fill="#555"/><rect x="41" width="43" height="20" fill="#007ec6"/><rect width="84" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="215" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="310">Timer</text><text x="215" y="140" transform="scale(.1)" fill="#fff" textLength="310">Timer</text><text aria-hidden="true" x="615" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">05:00</text><text x="615" y="140" transform="scale(.1)" fill="#fff" textLength="330">05:00</text></g></svg></div> --- # https://pos.it/databases_24 .right[Course material] ```r # install.packages("usethis") usethis::use_course("posit-conf-2024/databases") # Slides for offline use: index.html # Materials: materials/ ``` posit.cloud: <https://posit.cloud/spaces/537049> GitHub: <https://github.com/posit-conf-2024/databases> --- background-image: url("images/10.webp") background-size: contain background-position: 100% 100% # First steps .pull-left[ - Connect - Discover <div class="script-number-font"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_10.R"><title>Script: db_10.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_10.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_10.R</text></g></svg></div> ``` r library(DBI) ``` ] --- background-image: url("images/10-frame.webp") background-size: contain background-position: 100% 100% # First steps .pull-left[ - Connect - Discover <div class="script-number-font"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_10.R"><title>Script: db_10.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_10.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_10.R</text></g></svg></div> ``` r library(tidyverse) library(DBI) ``` ] --- # Connect to the database First step when accessing the database. ``` r con <- dbConnect(duckdb::duckdb()) con ``` ``` <duckdb_connection 0faa0 driver=<duckdb_driver 22170 dbdir=':memory:' read_only=FALSE>> ``` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_10.R"><title>Script: db_10.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_10.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_10.R</text></g></svg></div> --- # Discover tables Where is my data? ``` r dbListTables(con) ``` ``` character(0) ``` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_10.R"><title>Script: db_10.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_10.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_10.R</text></g></svg></div> --- background-image: url("data:image/png;base64,#images/magic-hat-and-wand-nath-r.svg") background-size: 25% background-position: 100% 100% # Prepare database Normally done by database administrators and data engineers. ``` r dm::copy_dm_to( con, dm::dm_pixarfilms(), set_key_constraints = FALSE, temporary = FALSE ) ``` ``` Note: method with signature 'DBIConnection#Id' chosen for function 'dbExistsTable', target signature 'duckdb_connection#Id'. "duckdb_connection#ANY" would also be valid ``` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_10.R"><title>Script: db_10.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_10.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_10.R</text></g></svg></div> --- # Discover tables Where is my data? .pull-left[ ``` r dbListTables(con) ``` ``` [1] "academy" "box_office" [3] "genres" "pixar_films" [5] "pixar_people" "public_response" ``` ] .pull-right[ ``` r dbListFields(con, "box_office") ``` ``` [1] "film" [2] "budget" [3] "box_office_us_canada" [4] "box_office_other" [5] "box_office_worldwide" ``` ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_10.R"><title>Script: db_10.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_10.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_10.R</text></g></svg></div> --- background-image: url("images/10-frame.webp") background-size: contain background-position: 100% 100% # First steps: Exercises .pull-left[ 1. List all columns from the `pixar_films` table. 2. Review the help for `dbListFields()` and `dbListTables()`, and the index on <https://dbi.r-dbi.org/reference/>. ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_10.R"><title>Script: db_10.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_10.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_10.R</text></g></svg></div> <div class="exercise-timer"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="84" height="20" role="img" aria-label="Timer: 05:00"><title>Timer: 05:00</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="84" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="41" height="20" fill="#555"/><rect x="41" width="43" height="20" fill="#007ec6"/><rect width="84" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="215" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="310">Timer</text><text x="215" y="140" transform="scale(.1)" fill="#fff" textLength="310">Timer</text><text aria-hidden="true" x="615" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">05:00</text><text x="615" y="140" transform="scale(.1)" fill="#fff" textLength="330">05:00</text></g></svg></div> --- # Learning goals <img src="data:image/png;base64,#databases_files/figure-html/fig-learn-base-1.png" style="display: block; margin: auto;" /> --- # Learning goals <img src="data:image/png;base64,#databases_files/figure-html/fig-learn-most-1.png" style="display: block; margin: auto;" /> --- # Learning goals <img src="data:image/png;base64,#databases_files/figure-html/fig-learn-all-1.png" style="display: block; margin: auto;" /> --- .pull-left[ # Database Set of concepts: - relationships - data integrity - data redundancy - data control ] .pull-right[ # DBMS Software that embodies those concepts. > A Database Management System (DBMS) is a software system that uses a standard method to store and organize data. It allows users to create, retrieve, update, and manage data efficiently and provides tools for ensuring data consistency, integrity, and security. ] .center[These two terms are typically used interchangeably despite the differences.] --- # Database types .pull-left[ ## Relational (SQL database) or RDBMS Tables - Microsoft SQL Server - Oracle - PostgreSQL - MySQL / MariaDB - Exasol / Clickhouse - **duckdb** ] .pull-right[ ## Non-relational (NoSQL database) Key-value, document, graph, ... - MongoDB - Redis - Amazon DynamoDB ] .center[Server-based, can be found in "corporate" settings.] --- .pull-left[ # Server-based - It requires a separate server and dedicated resources (RAM, CPU, memory and storage) - A client (you / your application) access it over a network - Suitable for multi-user access (e.g. a "corporate" setting) - Deployment requires knowledge and expertise - Requires administration ] .pull-right[ # Embedded - It runs as part of the application process (e.g., in the R session of your laptop) - Does not require a separate database server - Minimal setup and configuration - **Handy in a workshop setting!** ] --- ## Lightweight, embedded and easy-to-configure RDBMS .pull-left[ # SQLite (2000) - Designed for embedded use in applications: mobile apps, web browser - "Most widely deployed and used database engine" ] .pull-right[ # DuckDB (2018) - Tailored towards heavy analytical workloads and data science applications - **"The SQLite for analytics"** ] --- .pull-left[ # Row data storage - SQLite - Good for transactional processing - Each row represented in a table is stored as an array on disk Easy to update individual rows. ] .pull-right[ # Columnar data storage - DuckDB - Good for analytics - Each column is stored as a chunked array on disk and in memory - Memory locality - Column compression **Faster queries for reading.** ] --- class: middle <img src="data:image/png;base64,#images/direct.svg" width="82%" height="82%" style="display: block; margin: auto;" /> --- background-image: url("data:image/png;base64,#images/11.webp") background-size: contain background-position: 100% 100% # Read whole tables .pull-left[ - {DBI} package - Read - Query <div class="script-number-font"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> ``` r library(DBI) ``` ] --- background-image: url("data:image/png;base64,#images/11-frame.webp") background-size: contain background-position: 100% 100% # Read whole tables .pull-left[ - {DBI} package - Read - Query <div class="script-number-font"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> ``` r library(tidyverse) library(DBI) ``` ] --- # Connect to the database First step when accessing the database. ```r con <- dbConnect(duckdb::duckdb()) con ``` ``` <duckdb_connection 0faa0 driver=<duckdb_driver 22170 dbdir=':memory:' read_only=FALSE>> ``` ## Caveat: server, credentials, ... ```r dbConnect( odbc::odbc(), # Alternatively: RPostgres::Postgres(), RPostgres::Redshift(), ... dsn = "...", # driver = "...", server = "...", uid = "...", pwd = "...", ... ) ``` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> --- # Discover tables Where is my data? .pull-left[ ``` r dbListTables(con) ``` ``` [1] "academy" "box_office" [3] "genres" "pixar_films" [5] "pixar_people" "public_response" ``` ] .pull-right[ ``` r dbListFields(con, "box_office") ``` ``` [1] "film" [2] "budget" [3] "box_office_us_canada" [4] "box_office_other" [5] "box_office_worldwide" ``` ] ## Caveat: schemas, catalogs, ... ```sql SELECT * FROM INFORMATION_SCHEMA.TABLES ``` ```r dbListFields(con, SQL("datasets.movies.box_office")) ``` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> --- # Read tables Read entire tables into your local session, if you can afford it. ``` r df_pixar_films <- dbReadTable(con, "pixar_films") df_pixar_films ``` ``` number film release_date run_time film_rating 1 1 Toy Story 1995-11-22 81 G 2 2 A Bug's Life 1998-11-25 95 G 3 3 Toy Story 2 1999-11-24 92 G 4 4 Monsters, Inc. 2001-11-02 92 G 5 5 Finding Nemo 2003-05-30 100 G 6 6 The Incredibles 2004-11-05 115 PG 7 7 Cars 2006-06-09 117 G 8 8 Ratatouille 2007-06-29 111 G 9 9 WALL-E 2008-06-27 98 G 10 10 Up 2009-05-29 96 PG 11 11 Toy Story 3 2010-06-18 103 G 12 12 Cars 2 2011-06-24 106 G 13 13 Brave 2012-06-22 93 PG 14 14 Monsters University 2013-06-21 104 G 15 15 Inside Out 2015-06-19 95 PG 16 16 The Good Dinosaur 2015-11-25 93 PG 17 17 Finding Dory 2016-06-17 97 PG 18 18 Cars 3 2017-06-16 102 G 19 19 Coco 2017-11-22 105 PG 20 20 Incredibles 2 2018-06-15 118 PG [ reached 'max' / getOption("max.print") -- omitted 7 rows ] ``` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> --- # Read tables Use `as_tibble()` to convert to a tibble for better display and more robust operation. ``` r df_pixar_films <- dbReadTable(con, "pixar_films") as_tibble(df_pixar_films) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 27 × 5</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_date</span> <span style='font-weight: bold;'>run_time</span> <span style='font-weight: bold;'>film_rating</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> 1 Toy Story 1995-11-22 81 G <span style='color: #BCBCBC;'>2</span> 2 A Bug's Life 1998-11-25 95 G <span style='color: #BCBCBC;'>3</span> 3 Toy Story 2 1999-11-24 92 G <span style='color: #BCBCBC;'>4</span> 4 Monsters, Inc. 2001-11-02 92 G <span style='color: #BCBCBC;'>5</span> 5 Finding Nemo 2003-05-30 100 G <span style='color: #BCBCBC;'>6</span> 6 The Incredibles 2004-11-05 115 PG <span style='color: #949494;'># ℹ 21 more rows</span> </CODE></PRE> <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> --- # Execute queries Write SQL code to define what data you want to see. ``` r dbGetQuery(con, "SELECT * FROM pixar_films") ``` ``` number film release_date run_time film_rating 1 1 Toy Story 1995-11-22 81 G 2 2 A Bug's Life 1998-11-25 95 G 3 3 Toy Story 2 1999-11-24 92 G 4 4 Monsters, Inc. 2001-11-02 92 G 5 5 Finding Nemo 2003-05-30 100 G 6 6 The Incredibles 2004-11-05 115 PG 7 7 Cars 2006-06-09 117 G 8 8 Ratatouille 2007-06-29 111 G 9 9 WALL-E 2008-06-27 98 G 10 10 Up 2009-05-29 96 PG 11 11 Toy Story 3 2010-06-18 103 G 12 12 Cars 2 2011-06-24 106 G 13 13 Brave 2012-06-22 93 PG 14 14 Monsters University 2013-06-21 104 G 15 15 Inside Out 2015-06-19 95 PG 16 16 The Good Dinosaur 2015-11-25 93 PG 17 17 Finding Dory 2016-06-17 97 PG 18 18 Cars 3 2017-06-16 102 G 19 19 Coco 2017-11-22 105 PG 20 20 Incredibles 2 2018-06-15 118 PG [ reached 'max' / getOption("max.print") -- omitted 7 rows ] ``` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> --- # Execute queries Write complex SQL code to define what data you want to see. ``` r sql <- " SELECT * FROM pixar_films WHERE release_date >= '2020-01-01' " ``` ``` r dbGetQuery(con, sql) ``` ``` number film release_date run_time film_rating 1 22 Onward 2020-03-06 102 PG 2 23 Soul 2020-12-25 100 PG 3 24 Luca 2021-06-18 151 N/A 4 25 Turning Red 2022-03-11 NA N/A 5 26 Lightyear 2022-06-17 NA N/A 6 27 <NA> 2023-06-16 155 Not Rated ``` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> --- # Execute queries R 4.0 or later: use new-style string literals for mixing quotes. ``` r sql <- r"( SELECT * FROM "pixar_films" WHERE "release_date" >= '2020-01-01' )" ``` ``` r dbGetQuery(con, sql) ``` ``` number film release_date run_time film_rating 1 22 Onward 2020-03-06 102 PG 2 23 Soul 2020-12-25 100 PG 3 24 Luca 2021-06-18 151 N/A 4 25 Turning Red 2022-03-11 NA N/A 5 26 Lightyear 2022-06-17 NA N/A 6 27 <NA> 2023-06-16 155 Not Rated ``` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> --- background-image: url("data:image/png;base64,#images/magic-hat-and-wand-nath-r.svg") background-size: 25% background-position: 100% 100% # Further pointers .pull-left[ ## Quoting ``` r dbQuoteIdentifier(con, "from") ``` ``` <SQL> "from" ``` ``` r dbQuoteLiteral(con, "Toy Story") ``` ``` <SQL> 'Toy Story' ``` ``` r dbQuoteLiteral(con, as.Date("2020-01-01")) ``` ``` <SQL> '2020-01-01'::date ``` ``` r glue::glue_sql(...) ``` ] .pull-right[ ## Parameterized queries ``` r sql <- "SELECT count(*) FROM pixar_films WHERE release_date >= ?" ``` ``` r dbGetQuery(con, sql, params = list(as.Date("2020-01-01")) ) ``` ``` count_star() 1 6 ``` ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> --- background-image: url("data:image/png;base64,#images/11-frame.webp") background-size: contain background-position: 100% 100% # Read whole tables: Exercises .pull-left[ 1. Read the `academy` table. 2. Read all records from the `academy` table that correspond to awards won - Hint: `"SELECT * FROM academy WHERE status = 'Won'"` 3. Use quoting and/or a query parameter to make the previous query more robust. - Hint: `sql <- paste0("SELECT * FROM academy WHERE ", quoted_column, " = ?")` ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_11.R"><title>Script: db_11.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_11.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_11.R</text></g></svg></div> <div class="exercise-timer"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="84" height="20" role="img" aria-label="Timer: 05:00"><title>Timer: 05:00</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="84" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="41" height="20" fill="#555"/><rect x="41" width="43" height="20" fill="#007ec6"/><rect width="84" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="215" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="310">Timer</text><text x="215" y="140" transform="scale(.1)" fill="#fff" textLength="310">Timer</text><text aria-hidden="true" x="615" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">05:00</text><text x="615" y="140" transform="scale(.1)" fill="#fff" textLength="330">05:00</text></g></svg></div> --- <img src="data:image/png;base64,#images/connecting-dots-simple.svg" width="82%" height="82%" style="display: block; margin: auto;" /> --- background-image: url("data:image/png;base64,#images/12.webp") background-size: 40% background-position: 100% 100% # Downsizing on the database .pull-left[ - {dbplyr} package (part of the tidyverse) - Lazy tables - `collect()` - `select()` - `filter()` - `count()` <div class="script-number-font"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> ``` r library(tidyverse) ``` ] --- background-image: url("data:image/png;base64,#images/12-frame.webp") background-size: 40% background-position: 100% 100% # Downsizing on the database .pull-left[ - {dbplyr} package (part of the tidyverse) - Lazy tables - `collect()` - `select()` - `filter()` - `count()` <div class="script-number-font"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> ``` r library(tidyverse) ``` ] --- # Lazy tables A pointer to a SQL table. The data is still on the database! ``` r pixar_films <- tbl(con, "pixar_films") pixar_films ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: table<pixar_films> [?? x 5]</span> <span style='color: #949494;'># Database: DuckDB v1.0.0</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_date</span> <span style='font-weight: bold;'>run_time</span> <span style='font-weight: bold;'>film_rating</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> 1 Toy Story 1995-11-22 81 G <span style='color: #BCBCBC;'>2</span> 2 A Bug's Life 1998-11-25 95 G <span style='color: #BCBCBC;'>3</span> 3 Toy Story 2 1999-11-24 92 G <span style='color: #BCBCBC;'>4</span> 4 Monsters, Inc. 2001-11-02 92 G <span style='color: #BCBCBC;'>5</span> 5 Finding Nemo 2003-05-30 100 G <span style='color: #BCBCBC;'>6</span> 6 The Incredibles 2004-11-05 115 PG <span style='color: #949494;'># ℹ more rows</span> </CODE></PRE> <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- # Read the whole table ``` r df_pixar_films <- pixar_films |> collect() df_pixar_films ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 27 × 5</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_date</span> <span style='font-weight: bold;'>run_time</span> <span style='font-weight: bold;'>film_rating</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> 1 Toy Story 1995-11-22 81 G <span style='color: #BCBCBC;'>2</span> 2 A Bug's Life 1998-11-25 95 G <span style='color: #BCBCBC;'>3</span> 3 Toy Story 2 1999-11-24 92 G <span style='color: #BCBCBC;'>4</span> 4 Monsters, Inc. 2001-11-02 92 G <span style='color: #BCBCBC;'>5</span> 5 Finding Nemo 2003-05-30 100 G <span style='color: #BCBCBC;'>6</span> 6 The Incredibles 2004-11-05 115 PG <span style='color: #949494;'># ℹ 21 more rows</span> </CODE></PRE> <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- # Select columns With `select()`, like with data frames. .pull-left[ ``` r pixar_films |> select(1:3) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: SQL [?? x 3]</span> <span style='color: #949494;'># Database: DuckDB v1.0.0</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_date</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #BCBCBC;'>1</span> 1 Toy Story 1995-11-22 <span style='color: #BCBCBC;'>2</span> 2 A Bug's Life 1998-11-25 <span style='color: #BCBCBC;'>3</span> 3 Toy Story 2 1999-11-24 <span style='color: #BCBCBC;'>4</span> 4 Monsters, Inc. 2001-11-02 <span style='color: #BCBCBC;'>5</span> 5 Finding Nemo 2003-05-30 <span style='color: #BCBCBC;'>6</span> 6 The Incredibles 2004-11-05 <span style='color: #949494;'># ℹ more rows</span> </CODE></PRE> ] .pull-right[ ## Under the hood ``` r pixar_films |> select(1:3) |> show_query() ``` <PRE class="fansi fansi-output"><CODE><SQL> <span style='color: #0000BB;'>SELECT</span> number, film, release_date <span style='color: #0000BB;'>FROM</span> pixar_films </CODE></PRE> ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- # Select columns and read .pull-left[ ``` r df_pixar_films_3 <- pixar_films |> select(1:3) |> collect() df_pixar_films_3 ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 27 × 3</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_date</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #BCBCBC;'>1</span> 1 Toy Story 1995-11-22 <span style='color: #BCBCBC;'>2</span> 2 A Bug's Life 1998-11-25 <span style='color: #BCBCBC;'>3</span> 3 Toy Story 2 1999-11-24 <span style='color: #BCBCBC;'>4</span> 4 Monsters, Inc. 2001-11-02 <span style='color: #BCBCBC;'>5</span> 5 Finding Nemo 2003-05-30 <span style='color: #BCBCBC;'>6</span> 6 The Incredibles 2004-11-05 <span style='color: #949494;'># ℹ 21 more rows</span> </CODE></PRE> ] .pull-right[ ## Data on the database not affected ``` r pixar_films |> collect() ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 27 × 5</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_…¹</span> <span style='font-weight: bold;'>run_t…²</span> <span style='font-weight: bold;'>film_…³</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> 1 Toy Story 1995-11-22 81 G <span style='color: #BCBCBC;'>2</span> 2 A Bug's Li… 1998-11-25 95 G <span style='color: #BCBCBC;'>3</span> 3 Toy Story 2 1999-11-24 92 G <span style='color: #BCBCBC;'>4</span> 4 Monsters, … 2001-11-02 92 G <span style='color: #BCBCBC;'>5</span> 5 Finding Ne… 2003-05-30 100 G <span style='color: #BCBCBC;'>6</span> 6 The Incred… 2004-11-05 115 PG <span style='color: #949494;'># ℹ 21 more rows</span> <span style='color: #949494;'># ℹ abbreviated names: ¹​release_date, …</span> </CODE></PRE> ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- # Select rows With `filter()`, like with data frames. .pull-left[ ``` r pixar_films |> filter(release_date >= "2020-01-01") ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: SQL [6 x 5]</span> <span style='color: #949494;'># Database: DuckDB v1.0.0</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_…¹</span> <span style='font-weight: bold;'>run_t…²</span> <span style='font-weight: bold;'>film_…³</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> 22 Onward 2020-03-06 102 PG <span style='color: #BCBCBC;'>2</span> 23 Soul 2020-12-25 100 PG <span style='color: #BCBCBC;'>3</span> 24 Luca 2021-06-18 151 N/A <span style='color: #BCBCBC;'>4</span> 25 Turning Red 2022-03-11 <span style='color: #BB0000;'>NA</span> N/A <span style='color: #BCBCBC;'>5</span> 26 Lightyear 2022-06-17 <span style='color: #BB0000;'>NA</span> N/A <span style='color: #BCBCBC;'>6</span> 27 <span style='color: #BB0000;'>NA</span> 2023-06-16 155 Not Ra… <span style='color: #949494;'># ℹ abbreviated names: ¹​release_date,</span> <span style='color: #949494;'># ²​run_time, ³​film_rating</span> </CODE></PRE> ] .pull-right[ ## Under the hood ``` r pixar_films |> filter(release_date >= "2020-01-01") |> show_query() ``` <PRE class="fansi fansi-output"><CODE><SQL> <span style='color: #0000BB;'>SELECT</span> pixar_films.* <span style='color: #0000BB;'>FROM</span> pixar_films <span style='color: #0000BB;'>WHERE</span> (release_date >= '2020-01-01') </CODE></PRE> ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- # Select rows and read .pull-left[ ``` r df_pixar_films_202x <- pixar_films |> filter(release_date >= "2020-01-01") |> collect() df_pixar_films_202x ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 6 × 5</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_…¹</span> <span style='font-weight: bold;'>run_t…²</span> <span style='font-weight: bold;'>film_…³</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> 22 Onward 2020-03-06 102 PG <span style='color: #BCBCBC;'>2</span> 23 Soul 2020-12-25 100 PG <span style='color: #BCBCBC;'>3</span> 24 Luca 2021-06-18 151 N/A <span style='color: #BCBCBC;'>4</span> 25 Turning Red 2022-03-11 <span style='color: #BB0000;'>NA</span> N/A <span style='color: #BCBCBC;'>5</span> 26 Lightyear 2022-06-17 <span style='color: #BB0000;'>NA</span> N/A <span style='color: #BCBCBC;'>6</span> 27 <span style='color: #BB0000;'>NA</span> 2023-06-16 155 Not Ra… <span style='color: #949494;'># ℹ abbreviated names: ¹​release_date,</span> <span style='color: #949494;'># ²​run_time, ³​film_rating</span> </CODE></PRE> ] .pull-right[ ## Data on the database not affected ``` r pixar_films |> collect() ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 27 × 5</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_…¹</span> <span style='font-weight: bold;'>run_t…²</span> <span style='font-weight: bold;'>film_…³</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> 1 Toy Story 1995-11-22 81 G <span style='color: #BCBCBC;'>2</span> 2 A Bug's Li… 1998-11-25 95 G <span style='color: #BCBCBC;'>3</span> 3 Toy Story 2 1999-11-24 92 G <span style='color: #BCBCBC;'>4</span> 4 Monsters, … 2001-11-02 92 G <span style='color: #BCBCBC;'>5</span> 5 Finding Ne… 2003-05-30 100 G <span style='color: #BCBCBC;'>6</span> 6 The Incred… 2004-11-05 115 PG <span style='color: #949494;'># ℹ 21 more rows</span> <span style='color: #949494;'># ℹ abbreviated names: ¹​release_date, …</span> </CODE></PRE> ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- # Aggregate .pull-left[ - With `summarize()` - Use `.by` for grouped operations. ``` r pixar_films |> summarize(.by = film_rating, n = n()) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: SQL [4 x 2]</span> <span style='color: #949494;'># Database: DuckDB v1.0.0</span> <span style='font-weight: bold;'>film_rating</span> <span style='font-weight: bold;'>n</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'>1</span> Not Rated 1 <span style='color: #BCBCBC;'>2</span> N/A 3 <span style='color: #BCBCBC;'>3</span> G 13 <span style='color: #BCBCBC;'>4</span> PG 10 </CODE></PRE> ] .pull-right[ Old syntax: ``` r pixar_films %>% group_by(film_rating) %>% summarize(n = n()) %>% ungroup() ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: SQL [4 x 2]</span> <span style='color: #949494;'># Database: DuckDB v1.0.0</span> <span style='font-weight: bold;'>film_rating</span> <span style='font-weight: bold;'>n</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'>1</span> Not Rated 1 <span style='color: #BCBCBC;'>2</span> PG 10 <span style='color: #BCBCBC;'>3</span> N/A 3 <span style='color: #BCBCBC;'>4</span> G 13 </CODE></PRE> ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- # Aggregate .pull-left[ ``` r pixar_films |> summarize(.by = film_rating, n = n()) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: SQL [4 x 2]</span> <span style='color: #949494;'># Database: DuckDB v1.0.0</span> <span style='font-weight: bold;'>film_rating</span> <span style='font-weight: bold;'>n</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'>1</span> G 13 <span style='color: #BCBCBC;'>2</span> PG 10 <span style='color: #BCBCBC;'>3</span> N/A 3 <span style='color: #BCBCBC;'>4</span> Not Rated 1 </CODE></PRE> ] .pull-right[ ## Under the hood ``` r pixar_films |> summarize(.by = film_rating, n = n()) |> show_query() ``` <PRE class="fansi fansi-output"><CODE><SQL> <span style='color: #0000BB;'>SELECT</span> film_rating, COUNT(*)<span style='color: #0000BB;'> AS </span>n <span style='color: #0000BB;'>FROM</span> pixar_films <span style='color: #0000BB;'>GROUP BY</span> film_rating </CODE></PRE> ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- # Aggregate With `count()`, like with data frames. .pull-left[ ``` r pixar_films |> # # count(film_rating) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># Source: SQL [4 x 2]</span> <span style='color: #949494;'># Database: DuckDB v1.0.0</span> <span style='font-weight: bold;'>film_rating</span> <span style='font-weight: bold;'>n</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'>1</span> G 13 <span style='color: #BCBCBC;'>2</span> Not Rated 1 <span style='color: #BCBCBC;'>3</span> PG 10 <span style='color: #BCBCBC;'>4</span> N/A 3 </CODE></PRE> ] .pull-right[ ## Under the hood ``` r pixar_films |> # # count(film_rating) |> show_query() ``` <PRE class="fansi fansi-output"><CODE><SQL> <span style='color: #0000BB;'>SELECT</span> film_rating, COUNT(*)<span style='color: #0000BB;'> AS </span>n <span style='color: #0000BB;'>FROM</span> pixar_films <span style='color: #0000BB;'>GROUP BY</span> film_rating </CODE></PRE> ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- # Aggregate and read .pull-left[ ``` r df_pixar_films_by_rating <- pixar_films |> count(film_rating) |> collect() df_pixar_films_by_rating ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 4 × 2</span> <span style='font-weight: bold;'>film_rating</span> <span style='font-weight: bold;'>n</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #BCBCBC;'>1</span> Not Rated 1 <span style='color: #BCBCBC;'>2</span> PG 10 <span style='color: #BCBCBC;'>3</span> G 13 <span style='color: #BCBCBC;'>4</span> N/A 3 </CODE></PRE> ] .pull-right[ ## Data on the database not affected ``` r pixar_films |> collect() ``` <PRE class="fansi fansi-output"><CODE><span style='color: #949494;'># A tibble: 27 × 5</span> <span style='font-weight: bold;'>number</span> <span style='font-weight: bold;'>film</span> <span style='font-weight: bold;'>release_…¹</span> <span style='font-weight: bold;'>run_t…²</span> <span style='font-weight: bold;'>film_…³</span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #949494; font-style: italic;'><date></span> <span style='color: #949494; font-style: italic;'><dbl></span> <span style='color: #949494; font-style: italic;'><chr></span> <span style='color: #BCBCBC;'>1</span> 1 Toy Story 1995-11-22 81 G <span style='color: #BCBCBC;'>2</span> 2 A Bug's Li… 1998-11-25 95 G <span style='color: #BCBCBC;'>3</span> 3 Toy Story 2 1999-11-24 92 G <span style='color: #BCBCBC;'>4</span> 4 Monsters, … 2001-11-02 92 G <span style='color: #BCBCBC;'>5</span> 5 Finding Ne… 2003-05-30 100 G <span style='color: #BCBCBC;'>6</span> 6 The Incred… 2004-11-05 115 PG <span style='color: #949494;'># ℹ 21 more rows</span> <span style='color: #949494;'># ℹ abbreviated names: ¹​release_date, …</span> </CODE></PRE> ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> --- background-image: url("data:image/png;base64,#images/12-frame.webp") background-size: 40% background-position: 100% 100% # Downsizing on the database: Exercises 1 * Find several ways to select the 3 first columns * What happens if you include the name of a variable multiple times in a `select()` call? * Select all columns that contain underscores (use `contains()`) * Use `all_of()` to select 2 columns of your choice <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> <div class="exercise-timer"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="84" height="20" role="img" aria-label="Timer: 15:00"><title>Timer: 15:00</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="84" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="41" height="20" fill="#555"/><rect x="41" width="43" height="20" fill="#007ec6"/><rect width="84" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="215" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="310">Timer</text><text x="215" y="140" transform="scale(.1)" fill="#fff" textLength="310">Timer</text><text aria-hidden="true" x="615" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">15:00</text><text x="615" y="140" transform="scale(.1)" fill="#fff" textLength="330">15:00</text></g></svg></div> --- background-image: url("data:image/png;base64,#images/12-frame.webp") background-size: 40% background-position: 100% 100% # Downsizing on the database: Exercises 2 .pull-left[ Find all films that 1. Are rated "PG" 2. Had a run time below 95 3. Had a rating of "N/A" or "Not Rated" 4. Were released after and including year 2020 5. Have a missing name (`film` column) or `run_time` 6. Are a first sequel (the name ends with "2") - Hint: Bring the data into the R session before filtering ] <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> <div class="exercise-timer"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="84" height="20" role="img" aria-label="Timer: 15:00"><title>Timer: 15:00</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="84" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="41" height="20" fill="#555"/><rect x="41" width="43" height="20" fill="#007ec6"/><rect width="84" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="215" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="310">Timer</text><text x="215" y="140" transform="scale(.1)" fill="#fff" textLength="310">Timer</text><text aria-hidden="true" x="615" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">15:00</text><text x="615" y="140" transform="scale(.1)" fill="#fff" textLength="330">15:00</text></g></svg></div> --- background-image: url("data:image/png;base64,#images/12-frame.webp") background-size: 40% background-position: 100% 100% # Downsizing on the database: Exercises 3 1. How many films are stored in the table? 2. How many films released after 2005 are stored in the table? 3. What is the total run time of all films? - Hint: Use `summarize(sum(...))`, watch out for the warning 4. What is the total run time of all films, per rating? - Hint: Use `.by` <div class="script-number"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20" role="img" aria-label="Script: db_12.R"><title>Script: db_12.R</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="100" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="43" height="20" fill="#555"/><rect x="43" width="57" height="20" fill="#fe7d37"/><rect width="100" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="225" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">Script</text><text x="225" y="140" transform="scale(.1)" fill="#fff" textLength="330">Script</text><text aria-hidden="true" x="705" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="470">db_12.R</text><text x="705" y="140" transform="scale(.1)" fill="#fff" textLength="470">db_12.R</text></g></svg></div> <div class="exercise-timer"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="84" height="20" role="img" aria-label="Timer: 15:00"><title>Timer: 15:00</title><linearGradient id="s" x2="0" y2="100%"><stop offset="0" stop-color="#bbb" stop-opacity=".1"/><stop offset="1" stop-opacity=".1"/></linearGradient><clipPath id="r"><rect width="84" height="20" rx="3" fill="#fff"/></clipPath><g clip-path="url(#r)"><rect width="41" height="20" fill="#555"/><rect x="41" width="43" height="20" fill="#007ec6"/><rect width="84" height="20" fill="url(#s)"/></g><g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" text-rendering="geometricPrecision" font-size="110"><text aria-hidden="true" x="215" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="310">Timer</text><text x="215" y="140" transform="scale(.1)" fill="#fff" textLength="310">Timer</text><text aria-hidden="true" x="615" y="150" fill="#010101" fill-opacity=".3" transform="scale(.1)" textLength="330">15:00</text><text x="615" y="140" transform="scale(.1)" fill="#fff" textLength="330">15:00</text></g></svg></div> ---